{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multivariate conditional independence testingwith `TIGRAMITE`\n",
    "\n",
    "TIGRAMITE is a time series analysis python module. It allows to reconstruct graphical models (conditional independence graphs) from discrete or continuously-valued time series based on the PCMCI framework and create high-quality plots of the results.\n",
    "\n",
    "This tutorial explains how the multivariate conditional independence test PairwiseMultCI works. For the theoretical details, we refer to\n",
    " - Tom Hochsprung, Jonas Wahl, Andreas Gerhardus, Urmi Ninad, and Jakob Runge.\n",
    "      Increasing Effect Sizes of Pairwise Conditional Independence Tests between Random Vectors. UAI2023, 2023.\n",
    "\n",
    "We imagine that such a multivariate test becomes relevant for vector-valued causal discovery, that is, causal discovery where certain variables are grouped together and one is only interested in learning causal relationsships between groups of variables.\n",
    "That research area is still active, for recent work, see\n",
    "- Wahl, J., Ninad, U., & Runge, J. (2023). Foundations of Causal Discovery on Groups of Variables. arXiv preprint arXiv:2306.07047.\n",
    "- Wahl, J., Ninad, U., & Runge, J. (2022). Vector causal inference between two groups of variables. arXiv preprint arXiv:2209.14283."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Imports\n",
    "import numpy as np\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "import tigramite\n",
    "\n",
    "from tigramite.independence_tests.parcorr import ParCorr\n",
    "from tigramite.independence_tests.pairwise_CI import PairwiseMultCI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multivariate CI-Testing\n",
    "\n",
    "### Overview\n",
    "\n",
    "Suppose one has random vectors $X, Y, Z$ (each potentially multivariate) and suppose that one  wants to decide whether $X$ is independent of $Y$ given $Z$ (that is, the conditional densities (assuming they exist) factorize: $\\forall x,y,z:\\;p_{X,Y|Z=z}(x,y) = p_{X|Z=z}(x)\\cdot p_{Y|Z=z}(y)$.)\n",
    "\n",
    "One approach to test independence between $X$ and $Y$ given $Z$ is to test whether all components $X_i$ and $Y_j$ are independent given $Z$.\n",
    "The conditional independence test PairwiseMultCI builds on this pairwise idea. PairwiseMultCI tests independence between each pair of components $X_i$ and $Y_j$ given $(Z,S_{ij}).$ Here, $S_{ij}$ is a set that consists of components of either $X$ or $Y$ that are independent of $Y_j$ or $X_i$. One can show that including these components increases the effect size of the corresponding test. To choose between the pairwise independence testing with and without increased conditioning sets, there is the boolean parameter ``learn_augmented_cond_sets``. Setting it to False will lead to pairwise independence testing without increased conditioning sets, setting it to true will lead to increased conditioning sets.\n",
    "\n",
    "In practice, these sets $S_{ij}$ are usually not known a priori and hence need to be learned.\n",
    "PairwiseMultCI first learns these conditional independencies on one part of the sample and then uses these conditional independencies to do the testing on the second part of the sample. The sample splitting ratio (``pre_step_sample_fraction``) is a hyperparameter, as is the significance level ($\\alpha_{pre}$) for learning independencies in the first step. (We think that only a small part of the sample should be used for learning independencies in the first step, moreover, $\\alpha_{pre}$ should be rather large to only mark something as independent if there is strong evidence). As a small side remark, setting ``pre_step_sample_fraction`` to $0$ is the same as setting ``learn_augmented_cond_sets`` to False.\n",
    "\n",
    "We remark that PairwiseMultCI is a \"meta-algorithm\", that is, it is a framework that works with several univariate test statistics. Because of that, PairwiseMultCI also has the parameter cond_ind_test, which can be any other conditional independence test from tigramite/independence_tests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Simulations - Part 1\n",
    "\n",
    "We start by generating a dataset where $X$ and $Z$ are univariate, and $Y=(Y_1,Y_2)$ is bivariate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "T = 100\n",
    "z = np.random.normal(0, 1, T). reshape(T, 1)\n",
    "x = np.random.normal(0, 1, T). reshape(T, 1) + 0.4 * z\n",
    "y1 = np.random.normal(0, 1, T). reshape(T, 1) + 0.2 * z\n",
    "y2 = y1 + 0.3 * np.random.normal(0, 1, T). reshape(T, 1) + 0.5 * x\n",
    "y = np.hstack((y1, y2))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then print out the maximum of the univariate test statistics and the aggregated p-value for PairwiseMultCI as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "test_stat 0.5344564686126332\n",
      "pval 7.821721152723775e-07\n"
     ]
    }
   ],
   "source": [
    "# alpha_pre: significance level for the first step\n",
    "# pre_step_sample_fraction: relative size of the sample for the first step\n",
    "# cond_ind_test: respective univariate conditional independence test\n",
    "ci_test = PairwiseMultCI(learn_augmented_cond_sets = True, alpha_pre = 0.5, pre_step_sample_fraction = 0.2, cond_ind_test = ParCorr())\n",
    "test_stat, pval = ci_test.run_test_raw(x = x, y = y, z = z)\n",
    "print(\"test_stat\", test_stat)\n",
    "print(\"pval\", pval)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Simulations - Part 2\n",
    "\n",
    "We now illustrate that PairwiseMultCI performs especially well relative to other approaches when the within-$Y$ (or within-$X$) dependence is strong. For that we consider similar models as above and calculate the rejection rate over $100$ replications. We also include one competitor  approach that just tests pairwise independencies (as explained in the second paragraph of the Overview-Section)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Large within-$Y$ dependence:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Rejection Rate PairwiseMultCI: 0.53\n",
      "Rejection Rate Competitor: 0.11\n"
     ]
    }
   ],
   "source": [
    "np.random.seed(2023)\n",
    "T = 100\n",
    "# number replications:\n",
    "repls = 100\n",
    "alpha = 0.05\n",
    "rejection_bool = np.zeros(repls)\n",
    "rejection_bool_competitor = np.zeros(repls)\n",
    "par_corr = ParCorr()\n",
    "for i in np.arange(0, repls):\n",
    "    # data generation:\n",
    "    z = np.random.normal(0, 1, T). reshape(T, 1)\n",
    "    x = np.random.normal(0, 1, T). reshape(T, 1) + 0.4 * z\n",
    "    y1 = np.random.normal(0, 1, T). reshape(T, 1) + 0.2 * z\n",
    "    y2 = y1 + 0.3 * np.random.normal(0, 1, T). reshape(T, 1) + 0.1 * x\n",
    "    y = np.hstack((y1, y2))\n",
    "    # PairwiseMultCI\n",
    "    ci_test = PairwiseMultCI(learn_augmented_cond_sets = True, alpha_pre = 0.5, pre_step_sample_fraction = 0.2, cond_ind_test = par_corr)\n",
    "    test_stat1, pval1 = ci_test.run_test_raw(x = x, y = y, z = z)\n",
    "    if (pval1 <= alpha):\n",
    "        rejection_bool[i] = 1  \n",
    "           \n",
    "    ## competitor:\n",
    "    ci_test_competitor = PairwiseMultCI()\n",
    "    test_stat2, pval2 = ci_test_competitor.run_test_raw(x = x, y = y, z = z)\n",
    "    if (pval2 <= alpha):\n",
    "        rejection_bool_competitor[i] = 1  \n",
    "print(\"Rejection Rate PairwiseMultCI:\", np.mean(rejection_bool))\n",
    "print(\"Rejection Rate Competitor:\",np.mean(rejection_bool_competitor))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that PairwiseMultCI strongly outperforms the competitor."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### No within-$Y$ dependence:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Rejection Rate PairwiseMultCI: 0.77\n",
      "Rejection Rate Competitor: 0.85\n"
     ]
    }
   ],
   "source": [
    "np.random.seed(2023)\n",
    "T = 100\n",
    "# number replications:\n",
    "repls = 100\n",
    "alpha = 0.05\n",
    "rejection_bool = np.zeros(repls)\n",
    "rejection_bool_competitor = np.zeros(repls)\n",
    "par_corr = ParCorr()\n",
    "for i in np.arange(0, repls):\n",
    "    # data generation:\n",
    "    z = np.random.normal(0, 1, T). reshape(T, 1)\n",
    "    x = np.random.normal(0, 1, T). reshape(T, 1) + 0.4 * z\n",
    "    y1 = np.random.normal(0, 1, T). reshape(T, 1) + 0.2 * z\n",
    "    y2 = 0.3 * np.random.normal(0, 1, T). reshape(T, 1) + 0.1 * x\n",
    "    y = np.hstack((y1, y2))\n",
    "    # PairwiseMultCI\n",
    "    ci_test = PairwiseMultCI(learn_augmented_cond_sets = True, alpha_pre = 0.5, pre_step_sample_fraction = 0.2, cond_ind_test = par_corr)\n",
    "    test_stat1, pval1 = ci_test.run_test_raw(x = x, y = y, z = z)\n",
    "    if (pval1 <= alpha):\n",
    "        rejection_bool[i] = 1  \n",
    "           \n",
    "    ## competitor:\n",
    "    ci_test_competitor = PairwiseMultCI()\n",
    "    test_stat2, pval2 = ci_test_competitor.run_test_raw(x = x, y = y, z = z)\n",
    "    if (pval2 <= alpha):\n",
    "        rejection_bool_competitor[i] = 1  \n",
    "print(\"Rejection Rate PairwiseMultCI:\", np.mean(rejection_bool))\n",
    "print(\"Rejection Rate Competitor:\",np.mean(rejection_bool_competitor))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that in case of no dependence between $Y_1$ and $Y_2$, PairwiseMultCI performs slightly worse than the competitor."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Remark:\n",
    "\n",
    "It is also possible to instantiate PairwiseMultCI with the option significance = \"fixed_thres\". Strictly speaking, setting the significance of the cond_ind_test_object is something else, however, at the moment, we will work with fixed_thres if at least at one occasion significance = fixed_thres. When working with fixed_thres, instead of working with the significance level $\\alpha_{pre},$ one needs to specify a threshold for the pre-step, called fixed_thres_pres. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusion:\n",
    "\n",
    "- PairwiseMultCI is only relevant when $X$ and/or $Y$ are multivariate.\n",
    "- PairwiseMultCI relies on the underlying assumption that independence is only violated for a few components $X_i$ and $Y_j$.\n",
    "- PairwiseMultCI works especially well (relative to other approaches) when the within-$X$ or within-$Y$ dependence is large."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "tigenv",
   "language": "python",
   "name": "tigenv"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
